Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or semantic part segmentation. We start by adapting a state-of-the-art semantic segmentation system to this task, and show that a fully-convolutional deep CNN coupled with Dense CRF labelling provides excellent results for a broad range of object categories. Still, this approach remains agnostic to high-level constraints between object parts. We introduce such prior information by means of a Restricted Boltzmann Machine adapted to our task, and train our model in a discriminative fashion, as a hidden CRF, demonstrating that prior information can yield additional improvements. We also investigate the performance of our approach "in the wild", without information concerning the objects' bounding boxes, using an object detector to guide a multi-scale segmentation scheme. We evaluate the performance of our approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing and face labelling respectively. We show superior performance with respect to competitive methods that have been extensively engineered on these benchmarks, as well as realistic qualitative results on part segmentation, even for occluded or deformable objects. We also provide quantitative and extensive qualitative results on three classes from the PASCAL Parts dataset. Finally, we show that our multi-scale segmentation scheme can boost accuracy, recovering segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 tables
HoloPose: Holistic 3D Human Reconstruction In-The-Wild.
We introduce HoloPose, a method for holistic monocular 3D human body reconstruction. We first introduce a part-based model for 3D model parameter regression that allows our method to operate in-the-wild, gracefully handling severe occlusions and large pose variation. We further train a multi-task network comprising 2D, 3D and DensePose estimation to drive the 3D reconstruction task. For this we introduce an iterative refinement method that aligns the model-based 3D estimates of 2D/3D joint positions and DensePose with their image-based counterparts delivered by CNNs, achieving both model-based global consistency and high spatial accuracy thanks to the bottom-up CNN processing. We validate our contributions on challenging benchmarks, showing that our method allows us to get both accurate joint and 3D surface estimates, while operating at more than 10 fps in-the-wild. More information about our approach, including videos and demos, is available at http://arielai.com/holopose
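The alignment objective behind the refinement step can be illustrated in miniature: fit a transform so that model-predicted joints match the CNN's image-based detections in the least-squares sense. HoloPose refines full 3D body-model parameters; this sketch solves only a 2D scale-and-translation version in closed form, and all names are illustrative.

```python
import numpy as np

def align_joints(model_joints, detected_joints):
    """Least-squares fit of scale s and translation t so that
    s * model_joints + t best matches the CNN-detected 2D joints.

    model_joints, detected_joints: (J, 2) arrays of joint positions.
    Returns the optimal (s, t) in closed form.
    """
    mm = model_joints.mean(axis=0)
    dm = detected_joints.mean(axis=0)
    mc = model_joints - mm            # centred model joints
    dc = detected_joints - dm         # centred detections
    s = (mc * dc).sum() / (mc * mc).sum()
    t = dm - s * mm
    return s, t
```

In the full method this model-to-image consistency term is minimized iteratively over the body-model parameters, which is what reconciles the model-based estimates with the bottom-up CNN outputs.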
WCDMA for Air to Ground and Ground to Air communications – Case Study for Greek airspace
In this paper the capacity of an Air-to-Ground and Ground-to-Air WCDMA system is studied. The outside-cell interference factor is estimated through simulations for both uplink and downlink. The calculation algorithm is explained, as well as the scanning mode for interfering cells around the desired cell. Results for the number of active users are presented for various values of the cell's maximum height and radius. A case study for the Athens, Thessaloniki and Iraklio airports has been carried out, in which the number of users per cell is calculated for a 12.2 kbps voice service and video calls at 64 and 128 kbps.
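The role of the interference factor in such capacity calculations can be sketched with the standard WCDMA uplink load equation (a textbook formula, not the paper's own simulation): each user contributes a load L = 1 / (1 + W / (R * Eb/N0 * v)), inflated by the other-cell interference factor f, and the cell admits users while the total load stays below a target. The parameter values below are illustrative, not taken from the paper.

```python
def max_uplink_users(chip_rate, bit_rate, ebno_db, activity,
                     other_cell_factor, max_load=0.75):
    """Standard WCDMA uplink load equation.

    chip_rate: W in chips/s (3.84 Mcps for WCDMA)
    bit_rate: R in bits/s of the service
    ebno_db: required Eb/N0 in dB
    activity: voice/data activity factor v
    other_cell_factor: outside-cell interference factor f
    max_load: target uplink load (e.g. 0.75)
    """
    ebno = 10 ** (ebno_db / 10)
    per_user = 1.0 / (1.0 + chip_rate / (bit_rate * ebno * activity))
    return int(max_load / ((1 + other_cell_factor) * per_user))
```

As in the paper's case study, higher-rate services (64 or 128 kbps video) consume far more load per user than 12.2 kbps voice, so the admitted user count drops sharply with bit rate.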
Softmesh: Learning Probabilistic Mesh Connectivity via Image Supervision
In this work we introduce Softmesh, a fully differentiable pipeline to transform a 3D point cloud into a probabilistic mesh representation that allows us to directly render 2D images. We use this pipeline to learn point connectivity from only 2D rendering supervision, reducing the supervision requirements for mesh-based representations. We evaluate our approach on a set of rendering tasks, including silhouette, normal, and depth rendering, on both rigid and non-rigid objects. We introduce transfer learning approaches to handle the diversity of the task requirements, and also explore the potential of learning across categories. We demonstrate that Softmesh achieves competitive performance even against methods trained with full mesh supervision.
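The key ingredient, a rendering that is differentiable in the connectivity probabilities, can be sketched for the silhouette case: if each candidate face f exists with probability p_f, a pixel lies inside the expected silhouette with probability 1 - prod_f (1 - p_f * mask_f). This is a toy illustration of probabilistic-mesh image supervision, not the paper's exact pipeline; the rasterized face masks are assumed given.

```python
import numpy as np

def expected_silhouette(face_masks, face_probs):
    """Expected silhouette under per-face existence probabilities.

    face_masks: (F, H, W) binary masks, one per candidate face.
    face_probs: (F,) probability that each face exists.
    A pixel is covered if at least one existing face covers it,
    so its expected value is 1 - prod_f (1 - p_f * mask_f),
    which is differentiable in face_probs.
    """
    miss = 1.0 - face_probs[:, None, None] * face_masks
    return 1.0 - np.prod(miss, axis=0)
```

Because the output is smooth in `face_probs`, a 2D silhouette loss can be backpropagated to the connectivity probabilities, which is what lets connectivity be learned from image supervision alone.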
Deep Spatio-Temporal Random Fields for Efficient Video Segmentation.
In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely-connected spatio-temporal graph by capitalizing on recent advances in deep Gaussian Conditional Random Fields (GCRFs). Our method, called VideoGCRF, is (a) efficient, (b) has a unique global minimum, and (c) can be trained end-to-end alongside contemporary deep networks for video understanding. We experiment with multiple connectivity patterns in the temporal domain, and present empirical improvements over strong baselines on the tasks of both semantic and instance segmentation of videos. Our implementation is based on the Caffe2 framework and will be available at https://github.com/siddharthachandra/gcrf-v3.0
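The "unique global minimum, exact inference" property comes from the Gaussian form of the CRF: the energy E(x) = 0.5 * x^T (I + P) x - b^T x is a positive-definite quadratic (for suitable coupling P), so its minimizer is the solution of a single linear system. A minimal dense sketch, assuming a small problem where a direct solve is feasible (the real model uses structured, efficient solvers over spatio-temporal couplings):

```python
import numpy as np

def gcrf_inference(unary, pairwise):
    """Exact MAP inference for a Gaussian CRF.

    Energy: E(x) = 0.5 * x^T (I + P) x - b^T x,
    where b = unary terms and P = symmetric coupling matrix
    (e.g. linking pixels within and across frames).
    If I + P is positive definite, the unique minimizer is
    x* = (I + P)^{-1} b, obtained by one linear solve.
    """
    A = np.eye(len(unary)) + pairwise
    return np.linalg.solve(A, unary)
```

With zero coupling the output reduces to the unary predictions; turning on the coupling blends predictions across connected variables, which is how the model propagates evidence across space and time.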
UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory
In this work we train, in an end-to-end manner, a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a network can act like a Swiss Army knife for vision tasks; we call it an UberNet to indicate its overarching nature. The main contribution of this work consists in handling the challenges that emerge when scaling up to many tasks. We introduce techniques that facilitate (i) training a deep architecture while relying on diverse training sets and (ii) training many (potentially unlimited) tasks with a limited memory budget. This allows us to train in an end-to-end manner a unified CNN architecture that jointly handles (a) boundary detection, (b) normal estimation, (c) saliency estimation, (d) semantic segmentation, (e) human part segmentation, (f) semantic boundary detection, and (g) region proposal generation and object detection. We obtain competitive performance while jointly addressing all tasks in 0.7 seconds on a GPU. Our system will be made publicly available.
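The shared-trunk-plus-task-heads structure behind such multi-task training can be sketched in miniature: one update touches only the active task's head plus the shared trunk, so gradients for the inactive heads are never materialized. This is a toy linear stand-in for the CNN, with illustrative task names and shapes, not the paper's actual architecture or memory-management scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk and per-task heads (tiny linear stand-ins for a CNN).
trunk = rng.normal(size=(4, 3)) * 0.1
heads = {"seg": rng.normal(size=(3, 2)) * 0.1,
         "edges": rng.normal(size=(3, 1)) * 0.1}

def train_step(task, x, y, lr=0.05):
    """One interleaved multi-task SGD step on an MSE loss.

    Only the active task's head receives a gradient; the other
    heads are untouched, so per-step memory stays bounded as the
    number of tasks grows.
    """
    global trunk
    h = x @ trunk                        # shared features
    pred = h @ heads[task]               # task-specific output
    grad_out = 2.0 * (pred - y) / len(x) # d(MSE)/d(pred)
    grad_head = h.T @ grad_out
    grad_trunk = x.T @ (grad_out @ heads[task].T)
    heads[task] -= lr * grad_head
    trunk -= lr * grad_trunk
    return float(np.mean((pred - y) ** 2))
```

Cycling `train_step` over tasks drawn from their respective datasets trains all heads against the one shared trunk, which is the basic mechanism that lets diverse training sets coexist in one network.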
Deep Filter Banks for Texture Recognition, Description, and Segmentation
Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and the Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. We obtain in this manner state-of-the-art performance in numerous datasets well beyond textures, an efficient method to apply deep features to image regions, as well as benefits in transferring features from one domain to another.
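The "conv layers as filter banks" idea amounts to treating a convolutional feature map as an unordered bag of local descriptors and pooling it without regard to spatial layout. This sketch uses plain mean pooling for brevity; the paper's pooling of choice is the Fisher vector, which additionally encodes second-order statistics with respect to a GMM.

```python
import numpy as np

def orderless_pool(conv_features):
    """Orderless pooling of a conv feature map.

    conv_features: (H, W, C) activations from a conv layer,
    viewed as H*W local C-dimensional descriptors.
    Returns a C-dim image descriptor that is invariant to any
    spatial permutation of the descriptors.
    """
    return conv_features.reshape(-1, conv_features.shape[-1]).mean(axis=0)
```

The permutation invariance is the defining property of orderless pooling: shuffling the spatial positions of the local descriptors leaves the pooled representation unchanged, which is what makes it well suited to textures.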